A Probabilistic Model for the Cold-Start Problem in Rating Prediction using Click Data
One of the most efficient methods in collaborative filtering is matrix
factorization, which finds the latent vector representations of users and items
based on the ratings that users give to items. However, a matrix factorization based
algorithm suffers from the cold-start problem: it cannot find latent vectors
for items that have no previous ratings. This paper utilizes
click data, which can be collected in abundance, to address the cold-start
problem. We propose a probabilistic item embedding model that learns item
representations from click data, and a model named EMB-MF that connects it
with a probabilistic matrix factorization for rating prediction. The
experiments on three real-world datasets demonstrate that the proposed model is
not only effective in recommending items with no previous ratings, but also
outperforms competing methods, especially when the data is very sparse.
Comment: ICONIP 201
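A minimal sketch of how an item embedding learned from clicks could stand in for a missing latent vector. It assumes, as an illustration only, that an item's latent vector is the click-based embedding plus a rating-driven offset, and that the offset is zero for items with no ratings. The function name and this decomposition are hypothetical and are not taken from the EMB-MF paper.

```python
import numpy as np

def predict_rating(user_vec, item_embedding, item_offset=None):
    """Predict a rating as the inner product of user and item latent vectors.

    Assumption (hypothetical): item vector = click embedding + offset.
    A cold-start item has no ratings, so its offset defaults to zero and
    the prediction relies on the click-based embedding alone.
    """
    if item_offset is None:
        item_offset = np.zeros_like(item_embedding)
    return float(user_vec @ (item_embedding + item_offset))

user = np.array([1.0, 2.0])
clicked_item_embedding = np.array([0.5, 1.0])

# Cold-start item: no ratings, so only the click embedding is used.
cold_start_prediction = predict_rating(user, clicked_item_embedding)

# Rated item: the offset (here a made-up value) adjusts the embedding.
rated_prediction = predict_rating(user, clicked_item_embedding, np.array([0.5, 0.0]))
```

The point of the sketch is only that a prediction remains computable for an item without ratings, provided some representation of that item is available from another data source.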
Exploring Scholarly Data by Semantic Query on Knowledge Graph Embedding Space
The trend toward open science has produced several open scholarly datasets which
include millions of papers and authors. Managing, exploring, and utilizing such
large and complicated datasets effectively is challenging. In recent years,
the knowledge graph has emerged as a universal data format for representing
knowledge about heterogeneous entities and their relationships. The knowledge
graph can be modeled by knowledge graph embedding methods, which represent
entities and relations as embedding vectors in semantic space, then model the
interactions between these embedding vectors. However, the semantic structures
in the knowledge graph embedding space are not well studied, so knowledge
graph embedding methods are usually used only for knowledge graph completion
and not for data representation and analysis. In this paper, we propose to analyze
these semantic structures based on the well-studied word embedding space and
use them to support data exploration. We also define the semantic queries,
which are algebraic operations between the embedding vectors in the knowledge
graph embedding space, to solve queries such as similarity and analogy between
the entities on the original datasets. We then design a general framework for
data exploration by semantic queries and discuss the solution to some
traditional scholarly data exploration tasks. We also propose some new
interesting tasks that can be solved based on the uncanny semantic structures
of the embedding space.
Comment: TPDL 2019; add appendix for the KG20C scholarly knowledge graph
benchmark datase
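The semantic queries described in the abstract above are algebraic operations on embedding vectors. The sketch below illustrates similarity and analogy queries with cosine similarity and vector offsets, in the style commonly used for word embeddings. The entity names and the three-dimensional toy vectors are hypothetical, and the paper's actual query definitions may differ.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(query, embeddings, exclude=()):
    """Return the entity whose embedding is closest to `query` by cosine similarity."""
    candidates = [name for name in embeddings if name not in exclude]
    return max(candidates, key=lambda name: cosine_similarity(query, embeddings[name]))

def analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' using the vector offset b - a + c."""
    query = embeddings[b] - embeddings[a] + embeddings[c]
    return most_similar(query, embeddings, exclude={a, b, c})

# Hypothetical toy embeddings, chosen by hand for illustration.
toy = {
    "author_x": np.array([1.0, 0.0, 0.0]),
    "paper_x": np.array([1.0, 1.0, 0.0]),
    "author_y": np.array([0.0, 0.0, 1.0]),
    "paper_y": np.array([0.0, 1.0, 1.0]),
    "venue_z": np.array([1.0, 0.0, 1.0]),
}

similar_to_paper_x = most_similar(toy["paper_x"], toy, exclude={"paper_x"})
analogy_answer = analogy("author_x", "paper_x", "author_y", toy)
```

With these toy vectors, the similarity query returns the entity nearest to `paper_x`, and the analogy query transfers the author-to-paper offset from one author to another.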
Multi-Partition Embedding Interaction with Block Term Format for Knowledge Graph Completion
Knowledge graph completion is an important task that aims to predict the
missing relational link between entities. Knowledge graph embedding methods
perform this task by representing entities and relations as embedding vectors
and modeling their interactions to compute the matching score of each triple.
Previous work has usually treated each embedding as a whole and has modeled the
interactions between these whole embeddings, potentially making the model
excessively expensive or requiring specially designed interaction mechanisms.
In this work, we propose the multi-partition embedding interaction (MEI) model
with block term format to systematically address this problem. MEI divides each
embedding into a multi-partition vector to efficiently restrict the
interactions. Each local interaction is modeled with the Tucker tensor format
and the full interaction is modeled with the block term tensor format, enabling
MEI to control the trade-off between expressiveness and computational cost,
learn the interaction mechanisms from data automatically, and achieve
state-of-the-art performance on the link prediction task. In addition, we
theoretically study the parameter efficiency problem and derive a simple
empirically verified criterion for optimal parameter trade-off. We also apply
the framework of MEI to provide a new generalized explanation for several
specially designed interaction mechanisms in previous models.
Comment: ECAI 2020. Including state-of-the-art results for very small models
in appendi
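The scoring idea in the abstract above can be sketched as a sum of small Tucker-style interactions, one per partition. This is a rough illustration under stated assumptions: each embedding is stored as a (K, d) array of K partitions, a single core tensor is shared by all partitions, and the score is the plain sum of the local trilinear products. The function name, array shapes, and the shared core are assumptions for illustration and are not the authors' implementation.

```python
import numpy as np

def mei_style_score(h, r, t, core):
    """Sum of per-partition Tucker interactions (illustrative sketch).

    h, t : (K, d_e) arrays, head and tail entity embeddings in K partitions
    r    : (K, d_r) array, relation embedding in K partitions
    core : (d_e, d_r, d_e) array, core tensor shared by all partitions (assumption)
    """
    # For each partition k: sum over x, y, z of core[x, y, z] * h[k, x] * r[k, y] * t[k, z],
    # then sum the K local scores.
    return float(np.einsum("kx,ky,kz,xyz->", h, r, t, core))

# Tiny hand-checkable case: K = 2 partitions, each of size 1.
h = np.array([[2.0], [3.0]])
r = np.array([[1.0], [1.0]])
t = np.array([[5.0], [7.0]])
core = np.array([[[2.0]]])
score = mei_style_score(h, r, t, core)  # 2*1*5*2 + 3*1*7*2
```

Restricting interactions to within partitions keeps the core tensor at d_e * d_r * d_e entries, whereas a single Tucker core over the full embeddings would need (K * d_e) * (K * d_r) * (K * d_e) entries. This is the trade-off between expressiveness and cost that the abstract refers to.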
MEIM: Multi-partition Embedding Interaction Beyond Block Term Format for Efficient and Expressive Link Prediction
Knowledge graph embedding aims to predict the missing relations between
entities in knowledge graphs. Tensor-decomposition-based models, such as
ComplEx, provide a good trade-off between efficiency and expressiveness, which
is crucial because of the large size of real-world knowledge graphs. The recent
multi-partition embedding interaction (MEI) model subsumes these models by
using the block term tensor format and provides a systematic solution for the
trade-off. However, MEI has several drawbacks, some of which are carried over from its
subsumed tensor-decomposition-based models. In this paper, we address these
drawbacks and introduce the Multi-partition Embedding Interaction iMproved
beyond block term format (MEIM) model, with independent core tensor for
ensemble effects and soft orthogonality for max-rank mapping, in addition to
multi-partition embedding. MEIM improves expressiveness while still being
highly efficient, helping it to outperform strong baselines and achieve
state-of-the-art results on difficult link prediction benchmarks using fairly
small embedding sizes. The source code is released at
https://github.com/tranhungnghiep/MEIM-KGE.
Comment: Accepted at the International Joint Conference on Artificial
Intelligence (IJCAI), 2022; add appendix with extra experiment
An End-to-End Multi-Task Learning Model for Image-based Table Recognition
Image-based table recognition is a challenging task due to the diversity of
table styles and the complexity of table structures. Most previous
methods take a non-end-to-end approach that divides the problem into two
separate sub-problems, table structure recognition and cell-content
recognition, and then attempt to solve each sub-problem independently using two
separate systems. In this paper, we propose an end-to-end multi-task learning
model for image-based table recognition. The proposed model consists of one
shared encoder, one shared decoder, and three separate decoders which are used
for learning three sub-tasks of table recognition: table structure recognition,
cell detection, and cell-content recognition. The whole system can be easily
trained and run for inference end to end. In the experiments, we evaluate
the performance of the proposed model on two large-scale datasets: FinTabNet
and PubTabNet. The experimental results show that the proposed model outperforms
the state-of-the-art methods on all benchmark datasets.
Comment: 10 pages, VISAPP2023. arXiv admin note: substantial text overlap with
arXiv:2303.0764
On the Trade-off between the Number of Nodes and the Number of Trees in a Random Forest
In this paper, we focus on the prediction phase of a random forest and study
the problem of representing a bag of decision trees using a smaller bag of
decision trees, where we only consider binary decision problems on the binary
domain and simple decision trees in which an internal node is limited to
querying the Boolean value of a single variable. As a main result, we show that
the majority function of $n$ variables can be represented by a bag of $T$ ($< n$)
decision trees each with polynomial size if $n-T$ is a constant, where $n$
and $T$ must be odd (in order to avoid the tie break). We also show that a bag
of $n$ decision trees can be represented by a bag of $T$ decision trees each
with polynomial size if $n-T$ is a constant and a small classification error is
allowed. A related result on the $k$-out-of-$n$ functions is presented too
…
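The setting of the abstract above, a bag of simple decision trees over Boolean variables combined by majority vote, can be made concrete with a small sketch. The tree encoding below is an assumption for illustration: a leaf is 0 or 1, and an internal node is a tuple (variable index, subtree if 0, subtree if 1). The sketch only checks two elementary facts by brute force and does not reproduce the paper's construction: the majority function of n variables equals the vote of n single-node trees, and the majority of 3 variables also fits in a single tree.

```python
from itertools import product

def evaluate(tree, x):
    """Follow one root-to-leaf path; each internal node queries a single variable."""
    while not isinstance(tree, int):
        var, if_zero, if_one = tree
        tree = if_one if x[var] else if_zero
    return tree

def bag_predict(bag, x):
    """Majority vote over the trees (an odd bag size avoids ties)."""
    votes = sum(evaluate(tree, x) for tree in bag)
    return int(2 * votes > len(bag))

def majority(x):
    """Majority function of the input bits."""
    return int(2 * sum(x) > len(x))

def represents_majority(bag, n):
    """Brute-force check that the bag computes majority on all 2**n inputs."""
    return all(bag_predict(bag, x) == majority(x) for x in product((0, 1), repeat=n))

# A bag of n = 5 single-node trees; tree i simply outputs variable i.
stumps = [(i, 0, 1) for i in range(5)]

# A single tree (a bag of size 1) computing the majority of 3 variables.
maj3_tree = (0, (1, 0, (2, 0, 1)), (1, (2, 0, 1), 1))

stumps_ok = represents_majority(stumps, 5)
single_tree_ok = represents_majority([maj3_tree], 3)
```

The second check illustrates the kind of trade-off studied in the paper: fewer trees can compute the same function at the cost of larger individual trees.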